The Message Passing Interface specification (MPI) defines a portable message-passing API
used to program parallel computers. MPI programs pose a number of correctness challenges:
sent and expected values in communications may not match, resulting in incorrect
computations possibly leading to crashes; and programs may deadlock, resulting in wasted
resources. Existing tools are not completely satisfactory: model checking does not scale
with the number of processes; testing techniques waste resources and are highly dependent
on the quality of the test set.

As an alternative, we present a prototype for a type-based approach to programming and 
verifying MPI-like programs against protocols. Protocols are written in a dependent type 
language designed so as to capture the most common primitives in MPI, incorporating, in 
addition, a form of primitive recursion and collective choice. Protocols are then translated 
into Why3, a deductive software verification tool. Source code, in turn, is written in WhyML,
the language of the Why3 platform, and checked against the protocol. Programs that pass 
verification are guaranteed to be communication safe and free from deadlocks. We verified 
several parallel programs from textbooks using our approach, and report on the outcome. 


1 Introduction 

Background Message Passing Interface (MPI) [6], a standardized and portable message-passing
API, is the de facto standard for High Performance Computing. Some of the challenges in
developing correct MPI programs include: mismatches on exchanged values resulting
in incorrect computations, and deadlocks resulting in wasted time and resources.

Bugs in high performance computing are costly. High-end HPC centers cost hundreds
of millions to commission. At many of these centers, over 3 million dollars are spent on
electricity alone each year, and research teams apply for computer time through competitive
proposals, spending years planning experiments [8]. A deadlocked program represents an
exorbitant monetary cost, and such situations are hard to detect at runtime without
resource-wasting monitors.

The formal verification of MPI programs employs different methodologies such as runtime
verification and model checking or symbolic execution [17, 20, 22]. Runtime verification,
by its very nature, cannot guarantee the absence of faults. In addition, the
process can become quite expensive due to the difficulty in producing meaningful tests. Model
checking approaches typically face a scalability problem, since the verification state space grows
exponentially with the number of processes. Verifying real-world applications may restrict the
number of processes to only a few.


Motivation To illustrate the problem we present a classic MPI example that solves the finite
differences problem.
Finite differences is a numeric method for solving differential equations. The program starts
with an initial solution X_0, and calculates X_1, X_2, ... iteratively until a maximum number of
iterations is executed. The problem vector is split amongst all processes, each calculating its
part of the problem, and then joined at the end. The processes are set up in a ring topology as
depicted below, in order to exchange the boundary values necessary for the calculation of the
differences.



In MPI, every process (or participant) is assigned a rank; processes have access to their ranks.
Ranks start at 0 and end at size-1, where size is a constant that indicates the total number of
processes. The number of processes is chosen by the user at launch time.

MPI follows the Single Program Multiple Data (SPMD) model where all processes share the
same code. The behaviour of processes diverges based on their rank (typically via
conditional statements). This model makes deployment very simple, since a single piece of code
can be run on all machines.

Figure 1 shows an implementation of the finite differences problem in the C programming
language (simplified for clarity). Line 2 initializes MPI, and is followed by calls to functions that
return the current process rank (line 3) and the number of processes (line 4). The size of the input
vector is broadcast (line 5), and the input vector at rank 0 is split and scattered (line 6) among
all processes. Then, every process iterates a certain number of times, exchanging messages
with its left and right neighbors (lines 9-15) and calculating local values. Finally, process 0
calculates the global error, and gathers all the parts of the resulting vector (lines 16-17).

This implementation deadlocks when using unbuffered communication. Process 0 attempts
to send a message to its left neighbor, which in turn is attempting to send a message to its left
neighbor, and so forth, with no process actually receiving the message. The correct
implementation of this example requires three separate cases: one for the first process, one for
the last process, and a third for all the others. These cases have specific send/receive orders
that are not at all obvious.
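
The deadlock can be reproduced without an MPI runtime. The following Python sketch is a simplified model, not MPI itself: it assumes fully synchronous (rendezvous) sends, where a send completes only when the destination is blocked in a matching receive, and it ignores tags and message contents. The names ring_neighbours, naive_program, ordered_program and run are hypothetical and were chosen for this illustration only.

```python
def ring_neighbours(rank, size):
    """Left and right neighbours of `rank` in a ring of `size` processes."""
    left = size - 1 if rank == 0 else rank - 1
    right = 0 if rank == size - 1 else rank + 1
    return left, right


def naive_program(rank, size):
    """Operation order of Figure 1: both sends first, then both receives."""
    left, right = ring_neighbours(rank, size)
    return [("send", left), ("send", right), ("recv", right), ("recv", left)]


def ordered_program(rank, size):
    """Three-case ordering: first, last and middle processes differ."""
    left, right = ring_neighbours(rank, size)
    if rank == 0:
        return [("send", left), ("send", right), ("recv", right), ("recv", left)]
    if rank == size - 1:
        return [("recv", right), ("recv", left), ("send", left), ("send", right)]
    return [("recv", left), ("send", left), ("send", right), ("recv", right)]


def run(program, size):
    """Simulate rendezvous communication; return True if every process finishes.

    A send from p to q completes only when the next operation of q is a
    receive from p; both processes then advance together.
    """
    ops = [program(rank, size) for rank in range(size)]
    pc = [0] * size  # index of the next operation of each process
    progress = True
    while progress:
        progress = False
        for p in range(size):
            if pc[p] == len(ops[p]):
                continue  # process p has already finished
            kind, q = ops[p][pc[p]]
            if kind != "send" or pc[q] == len(ops[q]):
                continue  # receives are completed by the matching sender
            if ops[q][pc[q]] == ("recv", p):
                pc[p] += 1
                pc[q] += 1
                progress = True
    return all(pc[p] == len(ops[p]) for p in range(size))


if __name__ == "__main__":
    print("naive ordering completes:  ", run(naive_program, 4))
    print("ordered version completes: ", run(ordered_program, 4))
```

Under these assumptions the all-sends-first ordering never makes progress, because no process ever reaches a receive, while the three-case ordering runs to completion.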


Solution Our approach is inspired by multi-party session types, where types describe
protocols. Types describe not only the data exchanged in messages, but also the state transitions
of the protocol and hence the allowable patterns of message exchanges. Programs that conform
to well-formed protocols are communication-safe and free from deadlocks. Our approach makes
it possible to statically verify that participants communicate in accordance with the protocol,
thus guaranteeing the properties above. A novel notion of type equivalence allows source
code for individual processes to be typed against the same (global) type.

The general idea is as follows: first, a protocol is written in a type language designed for
the purpose. A protocol compiler checks whether the protocol is well-formed and compiles it
to a format that can be processed by a deductive program verifier. Parallel programs are then
checked against the generated protocol.

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  MPI_Scatter(data, n/size, MPI_FLOAT, &local[1], n/size, MPI_FLOAT, 0, MPI_COMM_WORLD);
  int left = rank == 0 ? size - 1 : rank - 1;
  int right = rank == size - 1 ? 0 : rank + 1;
  for (iter = 1; iter <= ITERATIONS; iter++) {
    MPI_Send(&local[1], 1, MPI_FLOAT, left, 0, MPI_COMM_WORLD);
    MPI_Send(&local[n/size], 1, MPI_FLOAT, right, 0, MPI_COMM_WORLD);
    MPI_Recv(&local[n/size+1], 1, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &status);
    MPI_Recv(&local[0], 1, MPI_FLOAT, left, 0, MPI_COMM_WORLD, &status);
    // Computation is performed here, removed for simplicity
  }
  MPI_Reduce(&localErr, &globalErr, 1, MPI_FLOAT, MPI_MAX, 0, MPI_COMM_WORLD);
  MPI_Gather(&local[1], n/size, MPI_FLOAT, data, n/size, MPI_FLOAT, 0, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}

Figure 1: An excerpt of an incorrect implementation of the finite differences problem 


In line with all type-based approaches, our method requires writing a ParType (a protocol)
for the program. Such a type serves as further documentation for the program. In addition, we
require a few program annotations to guide the verification tool.

Method We developed a protocol language in the form of a dependent type language [23].
The language includes the most common MPI-like communication primitives, in addition to
sequential composition, primitive recursion, and collective choice. Protocols are then translated
into Why3, a deductive software verification platform that features a rich, well-defined
specification language called Why. Source code, in turn, is written in a high-level language
with first-class support for parallel MPI-like primitives, namely WhyML, a language that is part
of Why3.

Why3 allows the programmer to split verification conditions in parts and prove each part
using a different Satisfiability Modulo Theories (SMT) solver. In cases where the solvers cannot
handle part of the proof, Why3 can generate code for use with proof assistants like Coq [2].
We chose the Why3 platform in order to avoid the annotation overhead required for static
verification of C or Fortran programs, the languages typically used to program in MPI.
This should be considered an experiment in a new programming methodology for developing
reliable parallel applications, and not a tool for verifying existing MPI applications.

Unlike other session type based approaches, our approach does not require explicit
global-to-local protocol projection. This allows us to support not only MPMD programs, where the
code for different ranks may be distinct, but also SPMD programs such as MPI-based ones.

Figure 2 shows the parametrized multi-party session types approach [25]. Global protocols
are first projected into local protocols for each role, a communication pattern shared by one or
more participants. In this case there are three roles, one for participant 0, one for participants
1 to size-2, and finally one for participant size-1.

Figure 2: The multi-party session types approach

(a) SPMD (b) MPMD

Figure 3: The ParTypes approach


Figure 3 shows our approach for SPMD and MPMD. Unlike multi-party session types, the
ParTypes approach does not require a separate projection step: participants are verified directly
against the global protocol. Furthermore, participants can be separate programs (MPMD), or
a single program (SPMD).

Contributions The contributions of this work are: 

• A protocol compiler (in the form of an Eclipse plugin), which verifies protocol formation
and translates the protocol into Why3;

• A theory for protocols in Why3; 

• An MPI-like library for parallel programming in WhyML; 

• The verification of sample WhyML programs against protocols. 

Outline The following two sections present the protocol language and our Why3 library for
parallel programming, detailing the verification workflow. After that we present the results we
obtained when comparing our approach against a similar tool for the C programming language.
After the related work section, we present our conclusions and pointers to future work.

protocol x p T                                    protocol definition
message i i D                                     point-to-point comm.
broadcast i x: D                                  broadcast operation
scatter i D                                       scatter operation
gather i D                                        gather operation
reduce i op D                                     reduce operation
allgather x: D                                    allgather operation
allreduce op x: D                                 allreduce operation
{T ... T}                                         sequence
foreach x: i..i T                                 repetition
if p T else T                                     collective choice
val x: D                                          value
integer | float | D[] | {x: D | p} | ...          datatypes
x | n | i + i | max(i,i) | length(i) | i[i] | ... index terms
true | i < i | p and p | a(i,...,i) | ...         index propositions
max | min | sum | ...                             functions for reduce

Figure 4: Protocol language grammar

2 Protocol language 

In order to verify the finite differences example using our approach, we must first create a protocol
that the program must follow. The grammar for the protocol language is described in Figure 4. Not
all protocols freely generated by the grammar are well formed. For instance, the from and to
ranks of the message primitive must be distinct and lie between 0 and size - 1. Refer to [23] for
details.

The protocol for our running example is in Figure 5. Every protocol specification starts with
the keyword protocol (line 1), followed by a protocol name, and a proposition (size >= 2) that
describes the number of processes required. The variable representing the number of processes
is called size, named after the MPI primitive (MPI_Comm_size). The protocol requires two or more
processes, in order to avoid having a single process sending messages to itself, which leads to a
deadlock. This restriction can be omitted and will be inferred by the validator for simple cases.

The protocol starts by specifying a global value, the maximum number of iterations. Global
values are known by every process but are not exchanged by communication. They typically
represent some relevant constants hardwired in the code or present in the command line. Such
values are introduced with the keyword val (line 2). The value is given a name, iterations,
so that it may be further used. Types in the protocol language can be refined. The language
supports some abbreviations for refined types; for example natural, which is any number greater
than or equal to 0, is an abbreviation of {x: integer | x >= 0}. The type in line 3 could instead
be written as {x: {y: integer | y >= 0} | x % size = 0} or
{x: integer | x >= 0 and x % size = 0}.

protocol FiniteDifferences (size >= 2) {
  val iterations: natural
  broadcast 0 n: {x: natural | x % size = 0}
  scatter 0 float[n]
  foreach iter: 0 .. iterations {
    foreach i: 0 .. size-1 {
      message i (size+i-1)%size float
      message i (i+1)%size float
    }
  }
  reduce 0 max float
  gather 0 float[n]
}

Figure 5: Finite differences protocol

All processes perform a broadcast operation, with process 0 (the root) sending the size of
the work vector to every process (line 3), followed by a scatter operation splitting an array
of size n (line 4) among all ranks. These are examples of collective operations, where there is
a root process sending data, and every process (including itself) receiving. Type float[n] is
an abbreviation of {x: float[] | length(x) = n}. Our language allows for the specification of
restrictions on the size of the array and every integer value it contains.
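
One way to read these refinement types is as predicates over values. The Python sketch below illustrates that reading only; it is not how the protocol compiler represents types. All names (integer, refine, natural, array_of, broadcast_type) are hypothetical, and Python lists stand in for arrays.

```python
def integer(x):
    """Base type integer (Python booleans are deliberately excluded)."""
    return isinstance(x, int) and not isinstance(x, bool)


def is_float(x):
    """Base type float."""
    return isinstance(x, float)


def refine(base, prop):
    """Model of {x: base | prop}: a value must satisfy both base and prop."""
    def check(x):
        return base(x) and prop(x)
    return check


# natural abbreviates {x: integer | x >= 0}
natural = refine(integer, lambda x: x >= 0)


def array_of(elem, length):
    """Model of D[n], that is {x: D[] | length(x) = n}."""
    def check(xs):
        return (isinstance(xs, list) and len(xs) == length
                and all(elem(x) for x in xs))
    return check


def broadcast_type(size):
    """Model of {x: natural | x % size = 0}, the type broadcast in line 3."""
    return refine(natural, lambda x: x % size == 0)


if __name__ == "__main__":
    n_type = broadcast_type(4)
    print(n_type(8), n_type(6), n_type(-4))
    print(array_of(is_float, 2)([1.0, 2.0]))
```

Nesting refine calls mirrors the two equivalent spellings given for the type in line 3: refining natural, or refining integer with a conjunction.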

The reader may have noticed that, unlike broadcast, scatter does not introduce a variable. 
The reason for this is that the result of a scatter operation is not identical in all processes, hence 
could not be referred to in the rest of the protocol. 

The protocol then enters a foreach loop (lines 5-10). The foreach operation is not a
communication primitive, but a variant of primitive recursion: a type constructor that expands its
body for a given number of iterations. The inner foreach loop (lines 6-9) is used to specify the
point-to-point communication of every process (to exchange boundary values). This constructor
allows, for example, protocols to be parametric on the number of processes. Intuitively, if we
were to expand the loop, it would result in the following message exchanges.

message 0, size-1 float
message 0, 1 float
message 1, 0 float
message 1, 2 float
message 2, 1 float
message 2, 3 float
...
message size-1, size-2 float
message size-1, 0 float

Process 0 first sends a message to the process on its left, then to the process on its right
(lines 7-8). Process 1 does the same and so on until process size - 1. Note that the protocol
language does not require message exchanges in the program to be globally sequential.
Sequentiality restrictions happen on a per-process level.

The rest of the protocol is simple: process 0 performs a reduce operation (line 11), where
every process sends process 0 its local error, and then it calculates the global error, which is the
maximum of all local errors. Finally, process 0 collects the results of every process with a gather
operation (line 12), thus building the final solution.

Table 1: Foreach expanded for each rank

i       Loop body               Rank 0        Rank 1    Rank size-2   Rank size-1
0       message 0 size-1        send size-1                           recv 0
        message 0 1             send 1        recv 0
1       message 1 0             recv 1        send 0
        message 1 2                           send 2
2       message 2 1                           recv 2
        message 2 3
...
size-2  message size-2 size-3                           send size-3
        message size-2 size-1                           send size-1   recv size-2
size-1  message size-1 size-2                           recv size-1   send size-2
        message size-1 0        recv size-1                           send 0


The operations in the protocol are very close to those in the programming language, except that
the language uses MPI_Send and MPI_Recv operations while the protocol uses message. Protocols
provide a global point of view of the communication, while programs bear a local point of view
of the communication.

If we expand the foreach loop as a table of send and receive operations, we can easily see why
the example in the introduction does not conform to the protocol: Table 1 shows the message
passing pattern for every process, where we have omitted the type of the message (float) for
conciseness. There are three different orderings of send and recv operations: one for process 0,
one for process size - 1, and one for every other process. This sequencing is guaranteed not to
deadlock. The example in the introduction does not match these projections.
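
The per-rank orderings can be derived mechanically from the inner foreach loop of the protocol. The Python sketch below unfolds the loop for a concrete size and keeps, for one rank, only the messages it takes part in. It is an illustration of the idea rather than the tool's algorithm, and the function names expand_ring and local_view are hypothetical.

```python
def expand_ring(size):
    """Unfold `foreach i: 0 .. size-1` into (sender, receiver) pairs, in order."""
    messages = []
    for i in range(size):
        messages.append((i, (size + i - 1) % size))  # message to the left neighbour
        messages.append((i, (i + 1) % size))         # message to the right neighbour
    return messages


def local_view(rank, size):
    """Sequence of send/recv operations that `rank` performs in one iteration."""
    ops = []
    for src, dst in expand_ring(size):
        if src == rank:
            ops.append(("send", dst))
        elif dst == rank:
            ops.append(("recv", src))
    return ops


if __name__ == "__main__":
    size = 4
    for rank in range(size):
        print(rank, local_view(rank, size))
```

For size 4, rank 0 sends twice and then receives twice, rank 3 receives twice and then sends twice, and ranks 1 and 2 receive, send twice, and receive again: the three orderings discussed above.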


3 Why3 theory for protocols 

With the protocol out of the way, we concentrate on programming the algorithm. This ordering
is not a requirement: the protocol could have been written by a different programmer, or both
the protocol and the program could be developed simultaneously.

To enable the verification of MPI programs with Why3, we developed two libraries: the
Why3 theory for protocols that provides a representation for protocols as a Why3 datatype and
the WhyML MPI library that replicates part of the MPI API with pre- and post-conditions for
the various communication primitives.


Why3 theory for protocols The Why3 theory for protocols features a representation of
protocols as a Why3 datatype (Figure 6). Every datatype constructor either has a continuation
or takes the rest of the protocol as a parameter (except for Skip, the empty protocol), unlike
the protocol language where primitives are sequenced with the sequencing operator ({T ... T}
in Figure 4). Continuations are implemented using the HighOrd theory of Why3, to allow
values from the program to be introduced into the protocol during verification. Protocols in
Why3 format are generated from the protocol language by a translator that first checks the
well-formedness of types generated by the grammar in Figure 4. Each datatype constructor
corresponds to a type constructor. Primitives that introduce values use the continuation format,
while those that do not introduce values take the rest of the protocol as a parameter. Both kinds
feature a datatype (D in Figure 4) representing the data exchanged.

type protocol =
    Val datatype continuation
  | Broadcast int datatype continuation
  | AllGather datatype continuation
  | AllReduce op datatype continuation
  | Scatter int datatype protocol
  | Gather int datatype protocol
  | Message int int datatype protocol
  | Reduce int op datatype protocol
  | If datatype protocol protocol protocol
  | Foreach int int (cont int) protocol
  | Skip
with
  op = Max | Min | Sum | Prod | Land | Band | Lor | Bor | Lxor | Bxor
with
  datatype =
    IntPred (pred int)
  | FloatPred (pred float)
  | ArrayIntPred (pred (array int))
  | ArrayFloatPred (pred (array float))
with
  continuation =
    IntAbs (cont int)
  | FloatAbs (cont float)
  | ArrayIntAbs (cont (array int))
  | ArrayFloatAbs (cont (array float))
with
  cont = func a protocol

Figure 6: The Why3 datatype for protocols

The type pred a used in the datatype constructors (lines 17-20) is an abbreviation of func a
bool, a function from any type to a boolean. Such a predicate is used to restrict values, encoding
refinement datatypes ({x: D | p} in Figure 4). This type abbreviation is part of the Why3
standard library. Similarly, the type cont a used in the continuation constructors (lines 23-26) is an
abbreviation of func a protocol, a function from any type to a protocol (which is the continuation,
used to introduce values into the protocol).

WhyML MPI library The WhyML MPI library includes MPI-like primitives such as init,
broadcast, scatter, gather, send, recv, as well as annotations foreach, expand, and isSkip required
to guide the verification process. In order to check that the program follows the protocol, each
Why3/MPI primitive is annotated with pre- and post-conditions.

The init primitive initializes the verification state, a structure used during the verification
process. The verification state has a single mutable field containing the protocol datatype.
Every ParTypes primitive takes the verification state as a parameter, verifies that the protocol
is correct for the primitive and, if there are no errors, updates the protocol field of the verification
state with the protocol continuation. Some of the ParTypes primitives and their annotations
can be seen in Figure 7.

val send (dest: int) (v: 'a) (s: state) : ()
  writes { s.protocol }
  requires { 0 <= dest /\ dest < size }
  requires { match project s.protocol with
    | Message src dst d _ -> src = rank /\ dest = dst /\ matches v d
    | _ -> false
  end }
  ensures { s.protocol = next (old s).protocol }

val apply (v: 'a) (s: state) : 'a
  writes { s.protocol }
  requires { match project s.protocol with
    | Val d _ -> matches v d
    | _ -> false
  end }
  ensures { s.protocol = continuation (old s).protocol v }
  ensures { result = v }

val foreach (s: state) : foreach_data
  writes { s.protocol }
  requires { match project s.protocol with
    | Foreach _ _ _ _ -> true
    | _ -> false
  end }
  ensures { s.protocol = next (old s).protocol }
  ensures { result = (
    foreach_body (old s).protocol,
    foreach_from (old s).protocol,
    foreach_to (old s).protocol )}

val expand (fd: foreach_data) (i: int) : state

  requires { let _,low,high = fd in low <= i <= high }
  ensures { let body,_,_ = fd in result = {protocol = body i} }
function project (p: protocol) : protocol =
  match p with
  | Message source dest _ remainingProtocol ->
    if rank <> source /\ rank <> dest
    then project remainingProtocol
    else p
  | _ -> p
  end

Figure 7: WhyML MPI-like library (excerpt)


Every ParTypes primitive calls the utility function project when verifying the protocol
(lines 35-42). Obtaining the projection of a protocol yields the protocol itself in most cases.
The exception is when the protocol is a message that is neither originating in nor addressed
to the rank in question. In this case the function recurs, skipping messages unrelated to the
current rank. Note that rank is not passed as a parameter; instead, the rank constant is handled
automatically by the Why3 verification process.
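
The behaviour of project can be mimicked outside Why3. The Python sketch below is an approximation with two stated differences from the library: protocols are encoded as nested tuples (a hypothetical encoding chosen for brevity), and the rank is passed explicitly instead of being a global constant. The helper name message is also hypothetical.

```python
SKIP = ("Skip",)


def message(source, dest, datatype, rest):
    """Hypothetical tuple encoding of the Message constructor."""
    return ("Message", source, dest, datatype, rest)


def project(protocol, rank):
    """Skip leading messages in which `rank` is neither sender nor receiver."""
    if protocol[0] == "Message":
        _, source, dest, _, remaining = protocol
        if rank != source and rank != dest:
            return project(remaining, rank)
    return protocol


if __name__ == "__main__":
    tail = message(1, 0, "float", SKIP)
    p = message(0, 3, "float", message(0, 1, "float", tail))
    print(project(p, 1))  # starts at the message from 0 to 1
    print(project(p, 2))  # rank 2 takes part in none of the messages
```

Only a prefix of unrelated messages is skipped: as in Figure 7, the function stops at the first constructor that involves the rank or that is not a Message.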

The send primitive (lines 1-8) receives the target rank, a value, and the current state. It
checks that the destination is a valid process (line 3), that the current verification state starts
with a Message after calling the project function (with the current rank as the source and the
same destination as in the program), and that the value being sent matches the refinement
(line 5). The send primitive returns nothing and ensures that the next state is the protocol after
the message (line 8).

let main () =
  let s = init fdiff_protocol in
  let iterations = apply 100000 s in
  let n = broadcast 0 input s in
  let local = scatter 0 work s in
  let left = if rank > 0 then rank-1 else size-1 in
  let right = if rank < size-1 then rank+1 else 0 in
  for iter = 1 to iterations do
    let inbody = expand (foreach s) iter in (* Annotation *)
    let body = foreach inbody in (* Annotation *)
    if (rank = 0) then (
      let f1 = expand body 0 in (* Annotation *)
      send left local[1] f1;
      send right local[n/size] f1; isSkip f1; (* Annotation *)
      let f2 = expand body 1 in (* Annotation *)
      local[n/size+1] <- recv right f2; isSkip f2; (* Annotation *)
      let f3 = expand body (size-1) in (* Annotation *)
      local[0] <- recv left f3; isSkip f3) (* Annotation *)
    else if (rank = size-1) then (
      let f1 = expand body 0 in (* Annotation *)
      local[n/size+1] <- recv right f1; isSkip f1; (* Annotation *)
      let f2 = expand body (size-2) in (* Annotation *)
      local[0] <- recv left f2; isSkip f2; (* Annotation *)
      let f3 = expand body (size-1) in (* Annotation *)
      send left local[1] f3;
      send right local[n/size] f3; isSkip f3) (* Annotation *)
    else (
      let f1 = expand body (rank-1) in (* Annotation *)
      local[0] <- recv left f1; isSkip f1; (* Annotation *)
      let f2 = expand body rank in (* Annotation *)
      send left local[1] f2;
      send right local[n/size] f2; isSkip f2; (* Annotation *)
      let f3 = expand body (rank+1) in (* Annotation *)
      local[n/size+1] <- recv right f3; isSkip f3); (* Annotation *)
    isSkip inbody; (* Annotation *)
    (* Computation is performed here, removed for simplicity *)
  done;
  globalerror := reduce 0 Max !localerror s;
  gather 0 local s;
  isSkip s; (* Annotation *)


Figure 8: The WhyML program for the corrected finite differences example

The apply primitive is used to introduce program values into the protocol. It checks that the
head of the protocol is a Val, and that the value introduced matches the restriction (line 13).
The contract ensures that the protocol becomes the continuation of the Val constructor, after
applying the value (line 16). Since a value is introduced, this primitive uses continuation instead
of next, which does not introduce values.

Finally, the foreach primitive requires that the protocol have a Foreach constructor
at the head (line 22). It ensures that the protocol continues with whatever follows Foreach
(line 25), updates the verification state, and returns a triple containing the body of the foreach
and its range (lines 26-29). The expand function takes the output of the foreach primitive and
an integer i, checks that the integer is in range (line 33), and returns the projection of the
Foreach for that integer (line 34).

The ParTypes primitives apply and foreach are not related to any MPI primitive; they are
simply annotations required to match the program against a protocol.
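
To make the role of the verification state concrete, the Python sketch below replaces the static contracts with runtime checks. It is a loose analogy under several assumptions: protocols are nested tuples, only Message and Skip are modelled, recv merely checks the protocol without transferring data, and the names State, ProtocolError, ring_protocol and is_skip are hypothetical. Why3 discharges the corresponding conditions statically, without running the program.

```python
SKIP = ("Skip",)


class ProtocolError(Exception):
    """Raised when an operation does not match the head of the protocol."""


class State:
    """Verification state: the remaining protocol, seen from one rank."""
    def __init__(self, protocol, rank):
        self.protocol = protocol
        self.rank = rank


def is_float(v):
    return isinstance(v, float)


def ring_protocol(size):
    """Messages of one unfolding of the inner foreach loop of Figure 5."""
    protocol = SKIP
    for i in reversed(range(size)):
        protocol = ("Message", i, (i + 1) % size, is_float, protocol)
        protocol = ("Message", i, (size + i - 1) % size, is_float, protocol)
    return protocol


def project(protocol, rank):
    """Skip leading messages that do not involve `rank`."""
    while protocol[0] == "Message" and rank not in (protocol[1], protocol[2]):
        protocol = protocol[4]
    return protocol


def send(state, dest, value):
    """Check the precondition of send, then advance past the message."""
    head = project(state.protocol, state.rank)
    if (head[0] != "Message" or head[1] != state.rank
            or head[2] != dest or not head[3](value)):
        raise ProtocolError("send to %d not allowed here" % dest)
    state.protocol = head[4]


def recv(state, source):
    """Check that a receive from `source` is expected, then advance."""
    head = project(state.protocol, state.rank)
    if head[0] != "Message" or head[1] != source or head[2] != state.rank:
        raise ProtocolError("recv from %d not allowed here" % source)
    state.protocol = head[4]


def is_skip(state):
    """True when nothing that involves this rank remains in the protocol."""
    return project(state.protocol, state.rank) == SKIP


if __name__ == "__main__":
    s = State(ring_protocol(3), rank=0)
    send(s, 2, 1.0)
    send(s, 1, 2.0)
    recv(s, 1)
    recv(s, 2)
    print("protocol consumed:", is_skip(s))
```

An operation issued out of order, such as rank 1 sending before it has received from rank 0, raises ProtocolError in this model; in the Why3 setting the same mistake shows up as an unprovable precondition.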

Checking WhyML code Figure 8 shows the finite differences example written in WhyML,
following the three separate send/receive orderings. The necessary annotations are all marked
with (* Annotation *). On line 2, the verification state is initialized with the finite differences
protocol, the result of translating the protocol in Figure 5 into a Why3 protocol type (an instance
of the type in Figure 6, not shown). On line 3, the apply primitive is used to introduce the number
of iterations into the protocol, consuming the Val constructor at the head. The subsequent lines
perform a broadcast and a scatter following the protocol, while consuming the corresponding
constructors.

On line 10 the foreach and expand primitives are called to obtain the body of the Foreach
constructor, with the current iteration count applied. The Foreach body contains another Foreach
loop (see Figure 5), but that loop does not correspond to a loop in the program; instead,
it is used to define the behavior of every process. There are three different send/recv orderings,
following the message projections in Table 1. Each branch (lines 11-34) corresponds to one of
the projections, with the Foreach constructor expanded for all intervening processes: the process
on the left, the process itself, and the process on the right (lines 12, 15 and 17 for example). The
isSkip function verifies that each of these projections is equivalent to Skip at that point in the
code (lines 14, 16 and 18 for example), to guarantee the protocol is followed correctly (see [23]
for the theory). The rest of the program is simple, with a final isSkip at the end (line 40) to
guarantee the protocol was completely consumed.

4 Evaluation 

We adapted a few classic parallel programming examples to WhyML, wrote their protocols, and
checked them with Why3. To evaluate the results, we compared verification times and the ratio
of annotations to lines of code. The closest work to ours checks programs written in C+MPI with
VCC [13]. The experimental setup was a 2.4 GHz Intel Core 2 Duo machine running Windows 7
with 4 GB of RAM.

Sample programs We verified the following programs: 

• Pi: a simple program that calculates an approximation of pi through numerical integration,
taken from [9].

• Finite differences: used as our running example. The code is adapted from [7].

• Parallel dot: calculates the dot product of two vectors, taken from [16].

Program              Why3 Sub-Proofs   Why3 Time (s)   VCC Time (s)   Why3/VCC
Pi                   27                1.6             2.4            66.7%
Finite differences   374               14.9            16.1           92.5%
Parallel dot         298               7.9             7.4            106.7%

Table 2: Results for Why3 and VCC verification times


Program              Why3 LOC   Why3 Annot.   Ratio   VCC LOC   VCC Annot.   Ratio
Pi                   33         6             18%     42        10           23%
Finite differences   86         29            33%     128       49           38%
Parallel dot         61         11            18%     99        30           30%

Table 3: Results for Why3 and VCC annotation requirements


Verification time The verification times obtained are in Table 2 (average of 10 runs). Though
Why3 can be used interactively, all the runs were automated, and no manual proofs were
necessary. As can be seen from the results, Why3 and VCC have similar performance. This is
surprising, as Why3 spawns a different Z3 process for each sub-proof. A possible explanation for
the similarity is that each individual sub-proof is substantially easier on the solver, and VCC
has to perform more verifications than Why3 due to concurrency and pointer related proofs.

The results are promising, more so since proofs can be done in parts if necessary.


Annotation effort The ratio for annotations can be seen in Table 3. The lines of code (LOC)
count ignores library imports, comments, and empty lines. VCC requires more annotations than
Why3 due to concurrency and pointer related annotations, but many of these can be automated
with an annotator or by employing C macros. That said, something similar could be done
for Why3. The only annotations the programmer would have to write would be foreach and
collective choice marks, greatly reducing the effort required to use our verification methodology.
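
The percentages in Table 3 are consistent with the number of annotations divided by the lines of code for each tool, truncated to a whole percent. The truncation is an inference from the reported figures, not something the text states. The short Python check below recomputes them; the dictionary layout and the function name annotation_percent are hypothetical.

```python
# (lines of code, annotations) per tool, copied from Table 3
TABLE_3 = {
    "Pi": {"Why3": (33, 6), "VCC": (42, 10)},
    "Finite differences": {"Why3": (86, 29), "VCC": (128, 49)},
    "Parallel dot": {"Why3": (61, 11), "VCC": (99, 30)},
}


def annotation_percent(loc, annotations):
    """Annotations as a percentage of lines of code, truncated (assumed)."""
    return 100 * annotations // loc


if __name__ == "__main__":
    for program, tools in TABLE_3.items():
        for tool, (loc, annotations) in tools.items():
            print(program, tool, annotation_percent(loc, annotations))
```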


5 Related Work 

Scribble is a language to describe protocols for message-based programs based on the
theory of multiparty session types. Protocols written in Scribble include explicit senders
and receivers, thus ensuring that all senders have a matching receiver and vice versa. Global
protocols are projected into each of their participants' counterparts, yielding one local protocol
for each participant present in the global protocol. Developers can then implement programs
based on the local protocols and using standard message-passing libraries, like Multiparty
Session C.

Pabble [33] is a parametric extension of Scribble, which adds indices to participants and
represents Scribble protocols in a compact and concise notation for parallel programming. Pabble
protocols can represent the interaction patterns of scalable MPI programs, where the number
of participants in a protocol is decided at runtime through parameters.

In this work we depart from multiparty session types along two distinct dimensions: a) our
protocol language is specifically built for MPI primitives, and b) we do not explicitly project a
protocol but instead check the conformance of code directly against a global protocol.

Tools for the verification of MPI programs The objectives of MPI verification tools are
diverse and include the validation of arguments to MPI primitives as well as resource usage,
ensuring interaction properties such as the absence of deadlocks, or asserting
functional equivalence to sequential programs [20]. The methodologies employed are also diverse,
ranging from traditional dynamic analysis up to model checking and symbolic execution. In
comparison, our methodology is based on type checking and deductive program verification, thus
avoiding testing and the state-explosion problem inherent to the model checking approaches
described below.

Model checking TASS [22] employs model checking and symbolic execution, and is also able 
to verify user-specified assertions for the interaction behavior of the program, so-called collective 
assertions, and to verify functional equivalence between MPI programs and their sequential 
counterparts. The approach performs a number of checks besides deadlock detection (such 
as buffer overflows and memory leaks), but, as expected, does not scale with the number of 
processes. 
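
As a rough indication of why exhaustive exploration scales poorly with the number of processes, the following Python sketch counts the interleavings of independent processes that each execute the same number of atomic steps. This is a simplified combinatorial model assumed here for illustration (the function name `interleavings` is hypothetical); it does not describe how TASS explores states, and real tools reduce the count with symbolic execution and partial order reduction.

```python
from math import factorial


def interleavings(processes: int, steps: int) -> int:
    """Count the interleavings of `processes` independent processes,
    each running `steps` atomic steps in program order.

    This is the multinomial coefficient (processes * steps)! / (steps!) ** processes.
    """
    return factorial(processes * steps) // factorial(steps) ** processes


# Growth for a fixed, small number of steps per process.
counts = [interleavings(n, 2) for n in range(1, 5)]
```

In this model, two processes with two steps each admit 6 interleavings, three admit 90, and four admit 2520, so the count grows very quickly in the number of processes even for tiny programs.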

MOPPER [5] is a verifier that detects deadlocks by checking the satisfiability of formulae 
obtained by analyzing execution traces of MPI programs. It uses a propositional encoding of 
constraints and partial order reduction techniques, obtaining significant speedups when compared 
with ISP. The concept of parallel control-flow graphs [1] allows for the static and dynamic 
analysis of MPI programs, e.g., as a means to verify sender-receiver matching in MPI source code. 

CIVL [12] is a model checker that uses a C-like unified intermediate verification language for 
specifying concurrency in message-passing, multi-threaded, or GPU languages. The tool uses 
model checking techniques and symbolic execution to detect deadlocks, assertion and bounds 
violations, as well as illegal memory usages. The tool’s underlying CVC3 theorem prover may 
fail due to state space explosion, and the user needs to provide input bounds on the command 
line to restrict the analysis to a finite subset of the state space. 

Runtime verification and testing ISP [?] is a deadlock detection tool that explores all 
possible process interleavings using a fixed test harness. Dynamic execution analyzers, such as 
DAMPI [21] and MUST [10], strive for the runtime detection of deadlocks and resource leaks. 

6 Conclusion 

We developed an Eclipse plugin for the development and validation of protocols, and a 
programming language for the development of parallel programs by adding MPI-like primitives to 
WhyML. We also developed a Why3 theory of protocols for the verification of MPI-like WhyML 
programs. With this approach we can ensure that programs that pass the Why3 verification 
are free from deadlocks, all message exchanges are type safe, and the program adheres to the 
protocol. 

Unlike model checkers (such as TASS [22]), our approach scales to any number of processes, 
with verification running in constant time with respect to the process count. No runtime 
verification of the software is necessary, as it is in ISP [?], DAMPI [21] or MUST [10]. These 
tools do not require protocols and typically require fewer program annotations, but the runtime 
verifiers require a good test suite, which is much harder to write than a protocol. Unlike 
Scribble [?], our approach can model MPI-like programs, including collective choices without 
communication. 

Previous work used VCC [3] to verify C+MPI programs. The approach is very similar to 
ours, but requires extra annotations regarding concurrency and pointers since VCC is a tool for 
verifying concurrent C programs. The annotations are also more complex, while in our approach 
they are more natural and fit with the code. 

The foreach primitive annotation requires familiarity with how the foreach primitive is 
expanded, but writing correct programs already implies that sort of mental reasoning. Naive 
approaches most likely result in deadlocks, as we illustrated in the finite differences example 
(Figure 1). The VCC-based approach shares these problems. 
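
The kind of deadlock a naive exchange produces can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not the finite differences program from the paper and not WhyML: it assumes synchronous (rendezvous) sends with no buffering, a ring topology of at least two processes, and the helper names `ring_program`, `runs_to_completion`, `naive_ring` and `ordered_ring` are hypothetical. In real MPI, a standard-mode send may be buffered, so the naive version can appear to work for small messages, but a program that relies on buffering is not guaranteed to terminate.

```python
def ring_program(rank, size, receive_first):
    """Operations of one process in a ring: exchange with right and left neighbours."""
    right = (rank + 1) % size
    left = (rank - 1) % size
    send, recv = ("send", right), ("recv", left)
    return [recv, send] if receive_first else [send, recv]


def runs_to_completion(programs):
    """Simulate rendezvous semantics: a send completes only when the
    destination is currently blocked on the matching receive.
    Returns False if the processes get stuck (deadlock)."""
    pc = [0] * len(programs)
    progress = True
    while progress:
        progress = False
        for p, prog in enumerate(programs):
            if pc[p] == len(prog):
                continue  # process p has finished
            kind, q = prog[pc[p]]
            if (kind == "send" and pc[q] < len(programs[q])
                    and programs[q][pc[q]] == ("recv", p)):
                pc[p] += 1  # the send completes
                pc[q] += 1  # the matching receive completes
                progress = True
    return all(pc[p] == len(prog) for p, prog in enumerate(programs))


def naive_ring(size):
    # Every process sends first: all block waiting for a receiver.
    return [ring_program(r, size, receive_first=False) for r in range(size)]


def ordered_ring(size):
    # Odd ranks receive first, which breaks the cycle of waiting senders.
    return [ring_program(r, size, receive_first=(r % 2 == 1)) for r in range(size)]
```

Under these assumptions `runs_to_completion(naive_ring(4))` is false, because every process is blocked in its send, while `runs_to_completion(ordered_ring(4))` is true. Ordering operations by rank parity is one standard way to break the cycle; it is shown here only as an illustration of the reasoning a programmer must do.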

We successfully verified a number of textbook examples of parallel programs, with 
verifications taking only a few seconds in the worst case, and none of the examples required 
manual proofs. 

Our prototype is based on a verification language, WhyML, that is not an appropriate 
language for industry use. While OCaml programs can be extracted from WhyML, OCaml 
is not a language typically used in high performance computing. Performance is the major 
consideration in high performance computing, and Fortran and C are the fastest high-level 
languages available. To tackle these issues (including the annotation problem), an appropriate 
language should be developed. This language, like our WhyML-based language, would have 
first-class parallel programming primitives, but it would be essentially a C or Fortran superset with 
restrictions. This language would compile to either C or Fortran, and by having essentially the 
same semantics, performance should be the same. 

Finally, other MPI primitives need to be supported, such as asynchronous communication 
primitives, topologies, communicators and wildcard receives. 